DATA PREPARATION
MULTIPLE REGRESSION PROGRAM 



Introduction: 

The data preparation program for the Multiple Regression
Program serves three principal functions:

(1) It allows the original data to be prepared off line 
in a simple fixed point format. 

(2) It permits data to be selected from the preliminary 
data tape in any desired fashion. Any variable on 
the preliminary data tape may be the dependent 
variable; many more variables may be put on the tape 
than are to be used in a single regression. A great 
deal of flexibility in data preparation is thus 
available. 

(3) Perhaps most important, almost any function of a
variable or combination of variables on the
preliminary data tape can be easily computed. Simple
sums or products of variables, as well as logs,
exponentials, sines and cosines, or combinations of
these functions, can be computed.

The program uses a language similar to RAFT for formula
compilation. The compiler, however, produces a fairly efficient
machine language program for each function typed in; thus,
running time is kept to a minimum.



DATA PREPARATION 
MULTIPLE REGRESSION OPERATING INSTRUCTIONS 

I. Input Routine, General

A. Punch data on versa-tape using the decimal integer
cartridge.

B. All data must be scaled 10^5, i.e. (XXXXX.XXXXX).

C. Each experiment should begin with L 3000.

D. Each experiment should end with L 0000 "S".

E. A maximum of 32 data points, including the dependent
variable but not including the constant, may be
included in each experiment.

F. There is no limit to the number of experiments.

G. The variables are numbered and referred to as
01 - 32 respectively.
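
One way to read the scaling rule in item B is that each value is punched as an integer of at most ten digits equal to the value times 10^5. The sketch below illustrates that reading only; the helper names are hypothetical, and the exact tape encoding is an assumption rather than something these instructions spell out.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE = 10 ** 5          # five digits after the implied decimal point
MAX_DIGITS = 10          # XXXXX.XXXXX holds ten digits in all


def to_scaled_integer(value: str) -> int:
    """Convert a decimal string such as '9.9' to an integer scaled by 10^5."""
    scaled = (Decimal(value) * SCALE).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    if abs(scaled) >= 10 ** MAX_DIGITS:
        raise ValueError(f"{value} does not fit the XXXXX.XXXXX format")
    return int(scaled)


def from_scaled_integer(number: int) -> Decimal:
    """Recover the decimal value from its scaled integer form."""
    return Decimal(number) / SCALE


print(to_scaled_integer("9.9"))        # scaled form of 9.9
print(from_scaled_integer(990000))     # back to the decimal value
```

Working from decimal strings rather than binary floats keeps the fifth decimal place exact, which matters when the punched digits are meant to be checked by eye.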

II. Input of the Headings 

A. Three numbers must be typed in for each problem. 
These are: 

1. The number of Independent variables (including 
the constant) N 

2. The number of experiments M 

3. An Identification number 

B. These numbers are entered from the typewriter as
fixed point integers.

C. The identification number may be either positive 
or negative. It controls the treatment of the 
constant in the regression equation (See Regression 
Program Operating Instructions IV) 

III. Equation Input 

A. Input is similar to RAFT except the beginning symbol 
for FCA ( ) may be omitted. 

B. Each equation should be ended with a TAB 

C. Spaces and Figure shifts are ignored in equations 

D. Three temporary storage locations 90, 91 and 99 are 
available plus a pseudo accumulator 99. 

E. The following functions are available as one digit
codes:

        Function    Code

1.      FCA         ,   (or separator)
2.      FAD         +
3.      FSB         -
4.      FMP         .
5.      FDV         /
6.      FSQ         ?
7.      PWR         1
8.      FST




F. Several subroutines are also available:

1.      L/S I       invert (reciprocal)
2.      L/S E       e^x
3.      L/S L       Ln (base e)
4.      L/S G       Log (base 10)
5.      L/S X       10^x
6.      L/S S       Sine
7.      L/S C       Cosine

G. Constants may be entered by enclosing them in
parentheses, e.g. (+2.5) etc.

H. If a subroutine I, E, etc. is used it must be
separated from the rest of the equation either by an
arithmetic code (+, -, etc.) or by a comma.

I. The maximum number of characters in an equation
is 256 (Figure Shifts & Spaces not included).

IV. To Operate Program 

A. Fill program tape - All tabs must be set

B. Press start "1" with sense switch "B" up

(1) Computer types Variables:

Type in No. of independent variables C/R

(2) Computer types Observations:

Type in No. of experiments or observations C/R

(3) Computer types IDENT No.:

Type an identification number C/R

C. In the above steps no comma, sign or period need be typed.

D. The computer will type EQUATIONS, then Y 00 followed by
N X equations. For N equal to 4,

The computer will type: (Note X01 always equals one)
Y 00 

X 01 (+1.0) 
X 02 
X 03 
X 04 

E. As each equation number is typed, the appropriate 
equation should be entered, followed by a tab. 

(1) Examples X02 etc. -

(a) 03 times 05                        03.05 T/B

(b) Ln 3.5 div. by 01                  L/S L(3.5)/01 T/B

(c) 01 div. by (02 plus 03)            02+03, L/S I99.01 T/B

(d) Ln (01-05) div. by (01-05)         01-05:90.L/S L99/90 T/B

(e), (f) [illegible in the source copy]

(g) 05                                 05 T/B

(2) Typing line feed will erase the current equation;
start "1" with SSB down will also erase an error.
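
For readers more familiar with a modern language, examples (a) through (d) can be restated roughly as below. This is only a sketch: the dictionary `v`, standing for the numbered variables on the preliminary data tape, and the function names are assumptions made for illustration. Example (c) is written step by step to mirror how the typed form appears to work through the pseudo accumulator 99.

```python
import math


def example_a(v):
    """(a) 03 times 05."""
    return v[3] * v[5]


def example_b(v):
    """(b) Ln 3.5 divided by 01."""
    return math.log(3.5) / v[1]


def example_c(v):
    """(c) 01 divided by (02 plus 03), following the typed order."""
    acc = v[2] + v[3]        # 02+03 leaves the sum in the accumulator
    acc = 1.0 / acc          # L/S I applied to the accumulator (99)
    return acc * v[1]        # .01 multiplies by variable 01


def example_d(v):
    """(d) Ln (01-05) divided by (01-05)."""
    temp_90 = v[1] - v[5]    # difference kept in temporary location 90
    return math.log(temp_90) / temp_90


# Hypothetical values for one experiment, keyed by variable number.
v = {1: 4.0, 2: 1.0, 3: 3.0, 5: 2.0}
print(example_a(v), example_b(v), example_c(v), example_d(v))
```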
F. After the final equation has been entered the computer
will type PUNCH DATA. The preliminary data tape punched
on the versa-tape should be placed in the photo reader.

G. Press start 2 with sense switch "C" up. Computer will
read preliminary data tape, compute the variables, and
punch a floating point tape in proper format for the
regression program.

H. After punching is finished END will be typed out.

Remove tape from punch and proceed to the Regression Program.

I. If it is desired to list the floating point data tape,
load it in the photo reader and press start 2 with sense
switch "C" down.

J. If it is desired to list the preliminary fixed point
data tape, load it in the photo reader and press start
2 with sense switch "B" down.

(1) Computer types Input Lister, then Variables:

Type in No. of independent variables C/R

(2) Computer types Observations:

Type in No. of experiments or observations C/R
Multiple Regression Program Description

I. Introduction

Multiple Regression is a familiar statistical technique with a variety 
of uses. It may be used in the usual fashion to find the correlation 
between a single dependent variable and several independent variables 
or simply as a curve fitting program. Mathematically the problem may 
be stated as follows: 

Given t sets of observations having one dependent variable Y and
n independent variables X_1, X_2, ..., X_n, find the coefficients
c_0, c_1, c_2, ..., c_n in the equation

    Y = c_0 + c_1 X_1 + c_2 X_2 + \cdots + c_n X_n

which reduce the sum of squares of Y the most. That is

    \sum_{i=1}^{t} (Y_{observed} - Y_{predicted})^2 = \text{minimum}
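
The quantity being minimized can be written out as a short sketch. The function names and the layout of an observation as a pair (independent values, observed Y) are assumptions chosen for illustration, not the program's own data format.

```python
def predict(coefficients, xs):
    """Y predicted = c0 + c1*X1 + ... + cn*Xn for one observation."""
    c0, rest = coefficients[0], coefficients[1:]
    return c0 + sum(c * x for c, x in zip(rest, xs))


def residual_sum_of_squares(coefficients, observations):
    """Sum over all observations of (Y observed - Y predicted) squared."""
    return sum((y - predict(coefficients, xs)) ** 2 for xs, y in observations)


# Two hypothetical observations lying exactly on Y = 1 + 2*X1.
observations = [((1.0,), 3.0), ((2.0,), 5.0)]
print(residual_sum_of_squares((1.0, 2.0), observations))
print(residual_sum_of_squares((0.0, 2.0), observations))
```

The regression seeks the coefficients for which this sum is smallest.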

II. The method

The method used is essentially the stepwise multiple regression
procedure given by Efroymson (1). The computational procedure is as
follows. The data is read in one experiment at a time and the following
matrix is computed, each element being a sum of t cross products.

             0         1          2          3

    0        YY        X_1 Y      X_2 Y      X_3 Y
    1        X_1 Y     X_1 X_1    X_2 X_1    X_3 X_1
    2        X_2 Y     X_1 X_2    X_2 X_2    X_3 X_2
    3        X_3 Y     X_1 X_3    X_2 X_3    X_3 X_3

Let the matrix M be represented by

    M = | V    C^T |
        | C    X   |

Then inverting the matrix X by a series of linear transformations will
transform the column C into the multiple regression coefficients.
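
The accumulation of the matrix one experiment at a time can be sketched as follows. The function name and the row layout (dependent variable first, then the independent variables, with the constant carried as a column of ones) are assumptions for illustration.

```python
def cross_product_matrix(experiments):
    """Accumulate M, where M[i][j] is the sum over experiments of row[i]*row[j].

    Each experiment is a sequence [Y, X1, X2, ...]; index 0 is the
    dependent variable, matching row and column 0 of the matrix.
    Returns None if no experiments are supplied.
    """
    matrix = None
    for row in experiments:
        if matrix is None:
            matrix = [[0.0] * len(row) for _ in row]
        for i, a in enumerate(row):
            for j, b in enumerate(row):
                matrix[i][j] += a * b
    return matrix


# Hypothetical data: Y, the constant (always one), and one variable.
rows = [
    [5.1, 1.0, 1.0], [7.9, 1.0, 2.0], [11.2, 1.0, 3.0],
    [13.8, 1.0, 4.0], [17.1, 1.0, 5.0], [19.9, 1.0, 6.0],
]
M = cross_product_matrix(rows)
for line in M:
    print(line)
```

Because each experiment only adds to running sums, the data never needs to be held in memory all at once, which fits the one-experiment-at-a-time reading described above.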

This inversion process is carried out stepwise by successively pivoting 
on diagonal elements of the X matrix. Each step constitutes one 
iteration. Pivoting on the diagonal element corresponding to a variable 
not in the solution brings this variable into the solution. Pivoting 
on the diagonal element corresponding to a variable already in the 
solution causes this variable to be dropped. 
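
A single pivoting step can be sketched as below. It assumes the usual Gauss-Jordan exchange step; in particular the sign convention (pivot row divided by the pivot, pivot column divided by the pivot and negated) is an assumption, chosen so that pivoting twice on the same element restores the original matrix, which is what lets one operation both add and drop a variable.

```python
def pivot(a, k):
    """Return a new matrix after pivoting on diagonal element a[k][k]."""
    akk = a[k][k]
    if akk <= 0.0:
        # Mirrors the program's single error halt.
        raise ValueError("Pivot Element Too Small")
    size = len(a)
    new = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == k and j == k:
                new[i][j] = 1.0 / akk
            elif i == k:
                new[i][j] = a[k][j] / akk
            elif j == k:
                new[i][j] = -a[i][k] / akk
            else:
                new[i][j] = a[i][j] - a[i][k] * a[k][j] / akk
    return new


# Hypothetical cross-product matrix: Y, constant, one variable.
M = [[1092.72, 75.0, 314.6], [75.0, 6.0, 21.0], [314.6, 21.0, 91.0]]
entered = pivot(M, 2)        # brings variable 2 into the solution
restored = pivot(entered, 2) # pivoting again drops it
print(entered[2][0])         # coefficient of variable 2 with no constant
```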

At each iteration an F test is made to determine which variable to
add to most greatly improve "the goodness of fit." After several variables
have been added one of the previously added variables may become
statistically insignificant and will then be dropped. This stepwise procedure
assures that only significant variables are included in the final solution.

An additional algorithm is included in the program as an option, which
will cause all the variables to be added before any statistical testing
takes place.

Unlike many regression programs this regression is not automatically 
fitted about the mean. Instead an extra column is carried in the matrix 
for the constant. This permits handling the constant just like any other 
independent variable, adding or dropping it at will. 

III. Symbols and Definitions

a_ij = any element in the matrix M

a_kk = a pivot element in the matrix M

F to enter = F value for entering a variable

F to drop = F value for dropping a variable

i = row number of the matrix

j = column number of the matrix

k = index of a pivot element in the matrix

M = the matrix described in section II

N = number of independent + dependent variables

n = number of experiments

s = standard error of dependent variable

V minus = variance increase caused by dropping the least significant
variable in the solution

V plus = variance decrease caused by adding the most significant variable
not in the solution

X_i = the ith variable

\bar{x}_i = the mean of the ith variable

V = element a_00 in matrix M

\phi = degrees of freedom



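1. Mean of variable

    \bar{x}_i = \frac{\sum X_i}{n}

2. Standard error of a variable

    s_i = \sqrt{\frac{\sum (X_i - \bar{x}_i)^2}{n - 1}}

3. Partial correlation coefficient for variables X_i and X_j

    r_{ij} = \frac{\sum (X_i - \bar{x}_i)(X_j - \bar{x}_j)}{\sqrt{\sum (X_i - \bar{x}_i)^2 \sum (X_j - \bar{x}_j)^2}}

4. General algorithm for pivoting on element a_kk. The new

    a_{ij} = \frac{a_{ij} a_{kk} - a_{ik} a_{kj}}{a_{kk}}    if i \ne k, j \ne k

    a_{kj} = \frac{a_{kj}}{a_{kk}}                           if i = k, j \ne k

    a_{ik} = -\frac{a_{ik}}{a_{kk}}                          if i \ne k, j = k

    a_{kk} = \frac{1}{a_{kk}}                                if i = j = k

5. Variance change caused by pivoting on element a_kk

    \frac{a_{0k} a_{k0}}{a_{kk}}

6. Degrees of freedom

    \phi = n - number of independent variables in solution

7. F test for dropping a variable

    F to drop = \frac{(V minus)(\phi)}{V}

8. F test for entering a variable

    F to enter = \frac{(V plus)(\phi)}{V - V plus}

9. Standard error of dependent variable

    s = \sqrt{V / \phi}

10. Confidence limit of the regression coefficient of variable x_k

    t s \sqrt{a_{kk}}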
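
The pivoting rule, the variance change, and the two F tests listed above can be combined into a rough sketch of the stepwise loop. This is an illustration under stated assumptions, not the program's actual code: the function names are hypothetical, the pivot sign convention is the usual Gauss-Jordan exchange step, a drop is tried before an entry on each iteration, and the F to enter uses the same degrees of freedom as the F to drop (some treatments of stepwise regression use one fewer).

```python
def pivot(a, k):
    """Gauss-Jordan exchange step on diagonal element a[k][k]."""
    akk = a[k][k]
    if akk <= 0.0:
        raise ValueError("Pivot Element Too Small")
    size = len(a)
    new = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == k and j == k:
                new[i][j] = 1.0 / akk
            elif i == k:
                new[i][j] = a[k][j] / akk
            elif j == k:
                new[i][j] = -a[i][k] / akk
            else:
                new[i][j] = a[i][j] - a[i][k] * a[k][j] / akk
    return new


def variance_change(a, k):
    """a_0k * a_k0 / a_kk: positive for a variable outside the solution."""
    return a[0][k] * a[k][0] / a[k][k]


def stepwise(matrix, n_experiments, f_enter=4.0, f_drop=4.0, max_iterations=100):
    """Return ({variable index: coefficient}, residual sum of squares)."""
    a = [row[:] for row in matrix]
    in_solution = set()
    for _ in range(max_iterations):
        phi = n_experiments - len(in_solution)   # degrees of freedom
        v = a[0][0]                              # current sum of squares of Y
        if v <= 0.0:
            break                                # exact fit, nothing left to test
        if in_solution:
            k = min(in_solution, key=lambda idx: abs(variance_change(a, idx)))
            v_minus = abs(variance_change(a, k))
            if v_minus * phi / v < f_drop:
                a = pivot(a, k)
                in_solution.remove(k)
                continue
        outside = [idx for idx in range(1, len(a))
                   if idx not in in_solution and a[idx][idx] > 0.0]
        if outside:
            k = max(outside, key=lambda idx: variance_change(a, idx))
            v_plus = variance_change(a, k)
            if v > v_plus and v_plus * phi / (v - v_plus) > f_enter:
                a = pivot(a, k)
                in_solution.add(k)
                continue
        break
    return {k: a[k][0] for k in sorted(in_solution)}, a[0][0]


# Hypothetical cross-product matrix for six experiments: Y, constant, one variable.
M = [[1092.72, 75.0, 314.6], [75.0, 6.0, 21.0], [314.6, 21.0, 91.0]]
coefficients, sum_sq = stepwise(M, n_experiments=6)
print(coefficients, sum_sq)
```

Because the constant is simply variable 1 in the matrix, the loop adds or drops it by the same tests as any other variable.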



IV. Program Description 

The program occupies twenty channels of memory from 00 through 23.
Channels 24 - 33 are available for more program. Channel 34 contains
a t table for 95% confidence limits, channel 35 contains the names of
the variables, 36 is used for temporary storage and the matrix M is
stored in channels 37 through 76.

The computation is done in floating point. Running time depends on
the size of the problem and output desired. For a typical problem with
16 variables the time is about 20 seconds per experiment to compute the
matrix and about 25 seconds per iteration.

It is recommended that the data be scaled so that the numbers are
between 0.1 and 10. With this scaling a 16 variable problem has been
run over 30 iterations with no noticeable round off error to five digit
accuracy.

The program contains only one error halt. If a pivot element becomes
zero or negative the program will type out Pivot Element Too Small and halt.
REGRESSION PROGRAM
Multiple Regression Operating Instructions

I. Tape Format 

A. Tape format is Alphabetic, with the experiments in order. 

B. Each block begins with an "F" and ends with an "S". 

C. A leading block must head each problem tape. This block contains 
3 words in the following order (fixed point @ B39):

1. The No. of independent variables including the constant 

2. The No. of experiments 

3. An Identification No. 

D. Each experiment must have the variables on tape in the following 
order (floating point) 

E. A total of 31 independent variables including the constant (one) 
is the maximum allowable. 

F. There is no limit to the number of experiments. 
II. The Sense Switch Settings 

A. Sense Switch B on causes the means, standard deviations and partial
correlation coefficients to be typed out.

B. Sense Switch C on causes the entire matrix to be inverted first
before any statistical testing takes place.

C. Sense Switch D on causes all output to be in Floating Point;
otherwise the output is Fixed Point, 10 digits scaled 10^5.

III. The Start Switches 

A. Start one clears memory, sets the F levels to constant values
(around 4.5) and starts reading the Input tape.

B. Start two allows new F values to be entered from the typewriter
and begins iterating.

C. Start three is the same as start one except the computer halts to
allow F values to be entered from the typewriter.



IV. Control of the Constant 

A. The program is set up with three possibilities concerning the
constant (variable X01). These are controlled by the sign
of the ID Number.

1. If the ID No. is positive the computer will add or drop
the constant according to statistical tests.

2. If the ID No. is negative the choice of whether or not
the constant is to be added is up to the operator. If
the ID No. is negative the computer will halt after reading
in the data tape and type:

SET CONSTANT    type , + 1. tab s or
                     , + 0. tab s

Typing a one will force the constant into the answer; typing
a zero will leave it out.

3. The sign of the ID number may, of course, be changed at any time
to obtain the desired result. The ID number is stored in location
7702 octal.

V. Forcing a Solution 

A. It is sometimes desired to force a given solution to a problem. 
Provision to do this has been provided in the program. 

B. To force a given solution 

1. Transfer to Location 0000.0 (Type L 00000 Enter on console) 

2. Computer will type ENTER REQUIRED SOLUTION, then CHANGE 
VARIABLE NO. type , + XX. Tab s 

(a) If variable XX is in the solution, it will be forced out 

(b) If it is not in the solution, it will be forced in 

3. Computer will perform the desired operation and return to the
point where CHANGE VARIABLE NO. is typed out.

4. Typing a zero for the variable (, +0. Tab s) will cause the
computer to type out MULTIPLE REGRESSION REQUIRED SOLUTION
then enter the normal output routine.

VI. The Output 

A. The program always outputs the following information. 
1. The Problem Identification No. 




2. As each variable is added or dropped the computer outputs: 

(a) The Name of the variable 

(b) The Sum of Squares of Y before adding this variable

(c) The Standard Error of Y 

3. When the Optimum solution is reached, the Computer types
out:

(a) The names of the Variables in the solution 

(b) The initial and final Sums of Squares and Standard Errors 
of the dependent variable Y 

(c) The F ratio for the regression 
4. The type-out continues with:

(a) The names of the Variables in the solution 

(b) The regression coefficients of those variables 

(c) The 95% confidence limits of the coefficients

(d) The amount the sum of squares of Y would be increased
if the variable were removed from the solution.

B. As optional additional information the following can be obtained 
(see sense switch settings and operating instructions) 

1. The means and standard deviations from the mean of each 
variable . 

2. The partial correlation coefficients of the variables. 

3. The observed and predicted value of the dependent variable
for each experiment and the deviation, and a recomputed 
value for the standard error of Y calculated from the 
original data. 



VII. To Operate Program 

A. Fill Program - All tabs must be set

B. Load Floating Point data tape in photoreader and press Start 1 or 3

1. If Start 1 is pressed, the computer will read the leading
block, type out the ID No., read the rest of the data and
begin iterating.

2. If Start 3 is pressed, the computer will halt and type 

F to ENTER Type , + X.X Tab S 
F to DROP Type , + X.X Tab S 

Computation will continue as in the case where Start 1 was 
pressed. 

3. When entering F levels the comma (,) and sign (+) must be
typed. The F values are entered as mixed numbers; up to
10 digits may be entered both before and after the decimal
point.

C. Computer will add and drop variables as determined by the sense
switch settings and F levels, typing out intermediate results
as it does so.

D. After an optimum solution has been reached, the computer will 
output the answers and halt 

E. To run observed vs Predicted Results press Start 

1. Computer will type Load Data Tape and halt. 

2. Load the data tape in the photoreader and press Start

3. Computer will read tape and output the observed and predicted
values and deviation of the dependent variable.

4. When all variables have been computed the computer will type 
out the recomputed standard error. 

VIII. Example Problem 

The following is a sample problem taken from Duncan (2), Quality Control
and Industrial Statistics, which illustrates several of the options
in the program.

The problem has one dependent variable, four independent variables
including the constant, and twenty experiments. A regression was run
using F levels of 4.0. This resulted in three of the variables entering
the solution. This solution is shown on Page 5 of the sample problem
type-out. The remaining variable was then forced into the solution.
This solution is shown on Page 6 of the sample problem type-out.
Experiment
No.          Y00       X02       X03       X04

 1.          9.9       8.5       7.6       4.4
 2.          9.3       8.2       7.8       4.2
 3.          9.9       7.5       7.3       4.2
 4.          9.7       7.4       7.2       4.4
 5.          9.0       7.6       7.3       4.3
 6.          9.6       7.4       6.9       4.6
 7.          9.3       7.3       6.9       4.6
 8.         13.0       9.6       8.0       3.6
 9.         11.8       9.3       7.8       3.6
10.          8.8       7.0       7.3       3.7
11.          8.9       8.2       7.1       4.6
12.          9.3       8.0       7.2       4.5
13.          9.4       7.7       7.6       4.2
14.          7.5       6.7       7.6       5.0
15.          8.4       8.2       7.0       4.8
16.          9.1       7.6       7.6       4.1
17.         10.0       7.4       7.8       3.1
18.          9.8       7.1       8.0       2.9
19.         10.1       7.0       8.3       3.9
20.          8.0       6.4       7.9       3.8

Y00 = dependent variable

X02 - X04 = independent variables
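
As a cross-check on the table, the sample data can be run through an ordinary least-squares fit. The sketch below is not the stepwise program: it solves the normal equations directly for the constant plus chosen columns, and all function names are hypothetical. The data are transcribed from the table above.

```python
# (Y00, X02, X03, X04) for the twenty experiments.
DATA = [
    (9.9, 8.5, 7.6, 4.4), (9.3, 8.2, 7.8, 4.2), (9.9, 7.5, 7.3, 4.2),
    (9.7, 7.4, 7.2, 4.4), (9.0, 7.6, 7.3, 4.3), (9.6, 7.4, 6.9, 4.6),
    (9.3, 7.3, 6.9, 4.6), (13.0, 9.6, 8.0, 3.6), (11.8, 9.3, 7.8, 3.6),
    (8.8, 7.0, 7.3, 3.7), (8.9, 8.2, 7.1, 4.6), (9.3, 8.0, 7.2, 4.5),
    (9.4, 7.7, 7.6, 4.2), (7.5, 6.7, 7.6, 5.0), (8.4, 8.2, 7.0, 4.8),
    (9.1, 7.6, 7.6, 4.1), (10.0, 7.4, 7.8, 3.1), (9.8, 7.1, 8.0, 2.9),
    (10.1, 7.0, 8.3, 3.9), (8.0, 6.4, 7.9, 3.8),
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        best = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[best] = aug[best], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def fit(rows, columns):
    """Least-squares fit of Y on a constant plus the chosen data columns."""
    xs = [[1.0] + [row[c] for c in columns] for row in rows]
    ys = [row[0] for row in rows]
    m = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(m)] for i in range(m)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(m)]
    coefficients = solve(xtx, xty)
    residuals = [y - sum(c * v for c, v in zip(coefficients, x))
                 for x, y in zip(xs, ys)]
    return coefficients, residuals


initial_sum_sq = sum(row[0] ** 2 for row in DATA)
coefficients, residuals = fit(DATA, columns=(1, 3))   # constant, X02, X04
final_sum_sq = sum(r * r for r in residuals)
print("initial sum of squares:", round(initial_sum_sq, 5))
print("coefficients (X01, X02, X04):", [round(c, 5) for c in coefficients])
print("final sum of squares:", round(final_sum_sq, 5))
```

The initial sum of squares is simply the sum of Y squared, because the program does not fit about the mean; it can be compared with the 1847.66000 shown in the sample type-out.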



MULTIPLE REGRESSION SAMPLE PROBLEMS

DATA PREPARATION                         Data Preparation Start 1

VARIABLES: 4
OBSERVATIONS: 20
IDENT NO:

EQUATIONS:

Y00       01
X01       (+1.0)
X02       02
X03       03
X04       04

PUNCH DATA                               Load Preliminary Data Tape

Listing of Compiled Floating Point Tape  Start 2 with SSC down.

N 001
00         9.90000
01         1.00000
02         8.50000
03         7.60000
04         4.40000

N 002
00         9.30000
01         1.00000
02         8.20000
03         7.80000
04         4.20000

N 003
00         9.90000
01         1.00000
02         7.50000
03         7.30000
04         4.20000

N 004
00         9.70000
01         1.00000
02         7.40000
03         7.20000
04         4.40000

N 005
00         9.00000
01         1.00000
02         7.60000
03         7.30000
04         4.30000

N 006
00         9.60000
01         1.00000
02         7.40000
03         6.90000
04         4.60000

N 007
00         9.30000
01         1.00000
02         7.30000
03         6.90000
04         4.60000

N 008
00        13.00000
01         1.00000
02         9.60000
03         8.00000
04         3.60000

N 009
00        11.80000
01         1.00000
02         9.30000
03         7.80000
04         3.60000

N 010
00         8.80000
01         1.00000
02         7.00000
03         7.30000
04         3.70000

N 011
00         8.90000
01         1.00000
02         8.20000
03         7.10000
04         4.60000

N 012
00         9.30000
01         1.00000
02         8.00000
03         7.20000
04         4.50000

N 013
00         9.40000
01         1.00000
02         7.70000
03         7.60000
04         4.20000

N 014
00         7.50000
01         1.00000
02         6.70000
03         7.60000
04         5.00000

N 015
00         8.40000
01         1.00000
02         8.20000
03         7.00000
04         4.80000

N 016
00         9.10000
01         1.00000
02         7.60000
03         7.60000
04         4.10000

N 017
00        10.00000
01         1.00000
02         7.40000
03         7.80000
04         3.10000

N 018
00         9.80000
01         1.00000
02         7.10000
03         8.00000
04         2.90000

N 019
00        10.10000
01         1.00000
02         7.00000
03         8.30000
04         3.90000

N 020
00         8.00000
01         1.00000
02         6.40000
03         7.90000
04         3.80000

END



F TO ENTER   ,+4.0                       Fill Regression Program
F TO DROP    ,+4.0                       Start 3 with Sense Switch B Down

[Entries that are not legible in the source copy are left blank.]

MULTIPLE REGRESSION PROBLEM NUMBER         2.00000

MEANS AND STANDARD DEVIATIONS

            MEANS         STAND. DEV.
Y00         9.54000
X01         1.00000        .00000
X02         7.70500
X03         7.51000
X04

PARTIAL CORRELATION COEFFICIENTS

Y00 VERSUS
X01
X02          .72940
X03
X04         -.48723

X01 VERSUS
X02
X03
X04

X02 VERSUS
X03          .04096
X04

X03 VERSUS
X04         -.66577

        VAR      SUM SQ          ST ERROR
ADD     X02      1847.66000      9.61161
ADD     X04
ADD     X01                       .76900

MULTIPLE REGRESSION OPTIMUM SOLUTION

DEGREES OF FREEDOM      17.00000

            ST ERROR      SUM SQ
INITIAL     9.61161       1847.66000
FINAL                        6.98526
F RATIO

VARIABLES IN SOLUTION

NAME     COEFFICIENT     CONFIDENCE     SUM SQ RED
X01      5.48080
X02      1.07062
X04

VARIABLES NOT IN SOLUTION

NAME     SUM SQ RED
X03       .04536

Press Start

LOAD DATA TAPE                           Load Data Tape and Press Start

OBSERVED VS PREDICTED RESULTS

EXP      OBSERVED      PREDICTED      DEVIATION
0001      9.90000
0002      9.30000
0003      9.90000
0004      9.70000
0005      9.00000
0006      9.60000
0007      9.30000
0008     13.00000
0009     11.80000      11.78090        .01910
0010      8.80000       9.21691
0011      8.90000
0012      9.30000
0013      9.40000
0014      7.50000
0015      8.40000
0016      9.10000
0017     10.00000
0018      9.80000      10.13656
0019     10.10000       9.01376
0020      8.00000       8.47296

ST ERROR   .64102

Set Location to 0000.0 and Press Start

ENTER REQUIRED SOLUTION

CHANGE VARIABLE NO  ,+3.  s

        VAR      SUM SQ          ST ERROR
ADD     X03      6.98526          .64102

CHANGE VARIABLE NO  ,+0.  s

MULTIPLE REGRESSION REQUIRED SOLUTION

DEGREES OF FREEDOM      16.00000

            ST ERROR      SUM SQ
INITIAL     9.61161       1847.66000
FINAL                        6.93990
F RATIO

VARIABLES IN SOLUTION

NAME     COEFFICIENT     CONFIDENCE     SUM SQ RED
X01                      10.90709
X02      1.06919           .40060
X03                       1.07386
X04      -.93604           .78536

VARIABLES NOT IN SOLUTION

NAME     SUM SQ RED

LOAD DATA TAPE
